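This paper provides a finite-time analysis of linear stochastic approximation (LSA) algorithms with fixed step size, a core method in statistics and machine learning. LSA is used to compute approximate solutions of a $d$-dimensional linear system $\bar{\mathbf{A}} \theta = \bar{\mathbf{b}}$, for which $(\bar{\mathbf{A}}, \bar{\mathbf{b}})$ can only be estimated through (asymptotically) unbiased observations $\{(\mathbf{A}(Z_n), \mathbf{b}(Z_n))\}_{n \in \mathbb{N}}$. We consider here the case where $\{Z_n\}_{n \in \mathbb{N}}$ is either an i.i.d. sequence or a uniformly geometrically ergodic Markov chain, and derive $p$-th moment inequalities and high-probability bounds for the iterates defined by LSA and its Polyak-Ruppert averaged version. More precisely, we establish bounds of order $(p \alpha t_{\operatorname{mix}})^{1/2} d^{1/p}$ on the $p$-th moment of the last iterate of LSA. In this formula, $\alpha$ is the step size of the procedure and $t_{\operatorname{mix}}$ is the mixing time of the underlying chain ($t_{\operatorname{mix}} = 1$ in the i.i.d. setting). We then prove finite-time instance-dependent bounds on the Polyak-Ruppert averaged sequence of iterates. These results are sharp in the sense that the leading term we obtain matches the local asymptotic minimax limit, including a tight dependence on the parameters $(d, t_{\operatorname{mix}})$ in the higher-order terms.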
translated by Google Translate
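As a rough illustration of the procedure analysed in the LSA abstract above, the sketch below runs the usual fixed-step recursion $\theta_{n} = \theta_{n-1} - \alpha\,(\mathbf{A}(Z_n)\theta_{n-1} - \mathbf{b}(Z_n))$ together with a running Polyak-Ruppert average. The recursion is the standard LSA form and is assumed here rather than quoted from the abstract; the 2-dimensional system, the Gaussian noise level, and all function names are hypothetical choices for the example, and only the i.i.d. setting is covered.

```python
import random


def matvec(A, x):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


def lsa_polyak_ruppert(sample_obs, theta0, alpha, n_steps):
    """Run fixed-step LSA; return (last iterate, Polyak-Ruppert average)."""
    theta = list(theta0)
    avg = [0.0] * len(theta)
    for n in range(1, n_steps + 1):
        A_n, b_n = sample_obs()  # noisy observation (A(Z_n), b(Z_n))
        residual = [r - b for r, b in zip(matvec(A_n, theta), b_n)]
        theta = [t - alpha * r for t, r in zip(theta, residual)]
        # Running mean of the iterates theta_1, ..., theta_n.
        avg = [a + (t - a) / n for a, t in zip(avg, theta)]
    return theta, avg


# Hypothetical toy system: A_BAR @ THETA_STAR == B_BAR.
A_BAR = [[2.0, 0.5], [0.5, 1.0]]
B_BAR = [1.0, -1.0]
THETA_STAR = [6.0 / 7.0, -10.0 / 7.0]


def make_iid_sampler(seed, noise=0.1):
    """Return a sampler of unbiased i.i.d. Gaussian perturbations of (A_BAR, B_BAR)."""
    rng = random.Random(seed)

    def sample_obs():
        A_n = [[a + rng.gauss(0.0, noise) for a in row] for row in A_BAR]
        b_n = [b + rng.gauss(0.0, noise) for b in B_BAR]
        return A_n, b_n

    return sample_obs


last, averaged = lsa_polyak_ruppert(
    make_iid_sampler(seed=0), theta0=[0.0, 0.0], alpha=0.05, n_steps=20000
)
```

In this toy run the last iterate keeps fluctuating at a scale set by the step size, while the averaged iterate settles much closer to `THETA_STAR`; that gap is what the two kinds of bounds in the abstract quantify.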
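This paper studies variational inference (VI) for training Bayesian neural networks (BNNs) in the overparameterized regime, i.e., when the number of neurons tends to infinity. More specifically, we consider overparameterized two-layer BNNs and point out a critical issue in mean-field VI training. This issue arises from the decomposition of the evidence lower bound (ELBO) into two terms: one corresponding to the likelihood function of the model and the second to the Kullback-Leibler (KL) divergence. In particular, we show both theoretically and empirically that there is a trade-off between these two terms in the overparameterized regime only when the KL term is appropriately rescaled with respect to the ratio between the number of observations and the number of neurons. We also illustrate our theoretical results with numerical experiments that highlight the critical choice of this ratio.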
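The two-term decomposition mentioned in the abstract above can be written as follows. The notation is an assumption made for illustration: $q$ denotes the variational posterior, $p_0$ the prior, and $\eta > 0$ a hypothetical scaling factor on the KL term ($\eta = 1$ gives the standard ELBO). The abstract ties the right choice of this factor to the ratio between the number of observations and the number of neurons, but does not give its exact form, so none is stated here.

```latex
\mathrm{ELBO}_{\eta}(q)
  = \underbrace{\mathbb{E}_{w \sim q}\big[\log p(y \mid x, w)\big]}_{\text{likelihood term}}
  \;-\; \eta \, \underbrace{\mathrm{KL}\big(q \,\|\, p_0\big)}_{\text{KL term}}
```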
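Personalised federated learning (FL) aims at collaboratively learning a machine learning model tailored to each client. Although promising advances have been made in this direction, most existing approaches do not allow for uncertainty quantification, which is crucial in many applications. In addition, personalisation in the cross-device setting still raises important issues, especially for new clients or clients with a small number of observations. This paper aims at filling these gaps. To this end, we propose a novel methodology that recasts personalised FL into the population modeling paradigm, where clients' models involve fixed common population parameters and random effects that aim at explaining data heterogeneity. To derive convergence guarantees for our scheme, we introduce a new class of federated stochastic optimisation algorithms that relies on Markov chain Monte Carlo methods. Compared to existing personalised FL methods, the proposed methodology has important benefits: it is robust to client drift, practical for inference on new clients, and, above all, enables uncertainty quantification under mild computational and memory overheads. We provide non-asymptotic convergence guarantees for the proposed algorithms and illustrate their performance on various personalised federated learning tasks.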
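The Sliced-Wasserstein distance (SW) is increasingly used in machine learning applications as an alternative to the Wasserstein distance and offers significant computational and statistical benefits. Since it is defined as an expectation over random projections, SW is commonly approximated by Monte Carlo. We adopt a new perspective to approximate SW by making use of the concentration of measure phenomenon: under mild assumptions, one-dimensional projections of a high-dimensional random vector are approximately Gaussian. Based on this observation, we develop a simple deterministic approximation for SW. Our method does not require sampling a number of random projections, and is therefore both accurate and easy to use compared to the usual Monte Carlo approximation. We derive non-asymptotic guarantees for our approach, and show that the approximation error goes to zero as the dimension increases, under a weak dependence condition on the data distribution. We validate our theoretical findings on synthetic datasets, and illustrate the proposed approximation on a generative modeling problem.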
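For reference, the Monte Carlo baseline that the Sliced-Wasserstein abstract above compares against can be sketched as below. This is the usual sampling-based estimator of order 2, not the deterministic approximation proposed in the paper; the function names, the restriction to equal-size point clouds, and the toy data are assumptions made for the example.

```python
import math
import random


def random_direction(dim, rng):
    """Draw a direction uniformly on the unit sphere by normalising a Gaussian vector."""
    g = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(v * v for v in g))
    return [v / norm for v in g]


def sorted_projection(points, direction):
    """Project each point onto the direction and sort the resulting 1-D values."""
    return sorted(sum(p_i * u_i for p_i, u_i in zip(p, direction)) for p in points)


def sliced_wasserstein2_mc(xs, ys, n_projections, seed=0):
    """Monte Carlo estimate of SW_2 between two equal-size point clouds."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes point clouds of equal size")
    rng = random.Random(seed)
    dim = len(xs[0])
    total = 0.0
    for _ in range(n_projections):
        u = random_direction(dim, rng)
        px, py = sorted_projection(xs, u), sorted_projection(ys, u)
        # In 1-D, squared W_2 between equal-size empirical measures pairs sorted samples.
        total += sum((a - b) ** 2 for a, b in zip(px, py)) / len(px)
    return math.sqrt(total / n_projections)


# Toy data: a Gaussian cloud in dimension 5 and a translated copy of it.
_rng = random.Random(1)
cloud = [[_rng.gauss(0.0, 1.0) for _ in range(5)] for _ in range(200)]
shift = [1.0, 0.0, 0.0, 0.0, 0.0]
shifted = [[p_i + s_i for p_i, s_i in zip(p, shift)] for p in cloud]

sw_same = sliced_wasserstein2_mc(cloud, cloud, n_projections=50)
sw_shift = sliced_wasserstein2_mc(cloud, shifted, n_projections=50)
```

Each projection costs a sort of both samples, and the estimate carries sampling noise that depends on `n_projections`; avoiding that sampling step is the motivation the abstract gives for a deterministic approximation.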
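Since the seminal work of Venkatakrishnan et al. (2013), Plug & Play (PnP) methods have become ubiquitous in Bayesian imaging. These methods derive minimum mean square error (MMSE) or maximum a posteriori (MAP) estimators for inverse problems in imaging by combining an explicit likelihood function with a prior that is implicitly defined by an image denoising algorithm. The PnP algorithms proposed in the literature mainly differ in the iterative schemes they use for optimisation or for sampling. In the case of optimisation schemes, some recent works guarantee convergence to a fixed point, albeit not necessarily a MAP estimate. In the case of sampling schemes, to the best of our knowledge, there is no known proof of convergence. There also remain important open questions regarding whether the underlying Bayesian models and estimators are well defined, well-posed, and have the basic regularity properties required to support these numerical schemes. To address these limitations, this paper develops theory, methods, and provably convergent algorithms for performing Bayesian inference with PnP priors. We introduce two algorithms: 1) PnP-ULA (Unadjusted Langevin Algorithm) for Monte Carlo sampling and MMSE inference; and 2) PnP-SGD (Stochastic Gradient Descent) for MAP inference. Using recent results on the quantitative convergence of Markov chains, we establish detailed convergence guarantees for these two algorithms under realistic assumptions on the denoising operators used, with special attention to denoisers based on deep neural networks. We also show that these algorithms approximately target a decision-theoretically optimal Bayesian model that is well-posed. The proposed algorithms are demonstrated on several canonical problems such as image deblurring, inpainting, and denoising, where they are used for point estimation as well as for uncertainty visualisation and quantification.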
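The idea of driving a Langevin sampler with a denoiser, as in the PnP abstract above, can be illustrated on a scalar toy problem. The sketch below is a simplified stand-in, not the paper's PnP-ULA scheme, which may include further terms: it uses Tweedie's identity to replace the score of the smoothed prior by the denoiser residual `(denoiser(x) - x) / eps`. The Gaussian prior with its closed-form MMSE denoiser, the observation model `y = x + noise`, and all names and parameter values are assumptions chosen so that the exact posterior mean is available for comparison.

```python
import math
import random


def gaussian_mmse_denoiser(x, prior_var, eps):
    """MMSE denoiser for a N(0, prior_var) signal observed in N(0, eps) noise."""
    return prior_var / (prior_var + eps) * x


def denoiser_ula_mean(y, noise_var, denoiser, eps, step, n_iter, burn_in, seed=0):
    """Estimate the posterior mean with an unadjusted Langevin chain driven by a denoiser."""
    rng = random.Random(seed)
    x, total, kept = 0.0, 0.0, 0
    for k in range(n_iter):
        grad_loglik = (y - x) / noise_var      # gradient of log N(y; x, noise_var) in x
        prior_score = (denoiser(x) - x) / eps  # Tweedie: score of the eps-smoothed prior
        x = x + step * (grad_loglik + prior_score) \
            + math.sqrt(2.0 * step) * rng.gauss(0.0, 1.0)
        if k >= burn_in:
            total += x
            kept += 1
    return total / kept


PRIOR_VAR, EPS, NOISE_VAR, Y = 1.0, 0.05, 0.25, 1.0


def denoise(x):
    return gaussian_mmse_denoiser(x, PRIOR_VAR, EPS)


estimate = denoiser_ula_mean(
    Y, NOISE_VAR, denoise, EPS, step=0.01, n_iter=60000, burn_in=1000
)
# Exact posterior mean under the smoothed Gaussian prior N(0, PRIOR_VAR + EPS).
exact = Y * (PRIOR_VAR + EPS) / (PRIOR_VAR + EPS + NOISE_VAR)
```

In imaging applications the closed-form denoiser would be replaced by a learned one, which is where the regularity assumptions on the denoising operator discussed in the abstract become relevant.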
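The idea of slicing divergences has proven successful for comparing two probability measures in various machine learning applications, including generative modeling. It consists in computing the expected value of a "base divergence" between one-dimensional random projections of the two measures. However, the topological, statistical, and computational consequences of this technique have not yet been well established. In this paper, we aim at bridging this gap and derive various theoretical properties of sliced probability divergences. First, we show that slicing preserves the metric axioms and the weak continuity of the divergence, implying that the sliced divergence shares similar topological properties. We then refine the results in the case where the base divergence belongs to the class of integral probability metrics. In addition, we establish that, under mild conditions, the sample complexity of a sliced divergence does not depend on the problem dimension. We finally apply our general results to several base divergences, and illustrate our theory with experiments on both synthetic and real data.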
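In symbols, the construction described in the abstract above can be written as follows. The notation is assumed for illustration: $\Delta$ is the base divergence between one-dimensional measures, $P^{\theta}(x) = \langle \theta, x \rangle$ is the projection onto direction $\theta$, $P^{\theta}_{\#}\mu$ is the push-forward of $\mu$ by $P^{\theta}$, and $\sigma$ is the distribution of the random directions, taken here to be uniform on the unit sphere $\mathbb{S}^{d-1}$ as is customary.

```latex
\mathbf{S}\Delta(\mu, \nu)
  = \mathbb{E}_{\theta \sim \sigma}
    \Big[ \Delta\big( P^{\theta}_{\#}\mu,\; P^{\theta}_{\#}\nu \big) \Big],
\qquad P^{\theta}(x) = \langle \theta, x \rangle, \quad \theta \in \mathbb{S}^{d-1}
```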
This paper presents a methodology for integrating machine learning techniques into metaheuristics for solving combinatorial optimization problems. Namely, we propose a general machine learning framework for neighbor generation in metaheuristic search. We first define an efficient neighborhood structure constructed by applying a transformation to a selected subset of variables from the current solution. Then, the key to the proposed methodology is to generate promising neighbors by selecting a proper subset of variables that contains a descent of the objective in the solution space. To learn a good variable selection strategy, we formulate the problem as a classification task that exploits structural information from the characteristics of the problem and from high-quality solutions. We validate our methodology on two metaheuristic applications: a Tabu Search scheme for solving a Wireless Network Optimization problem and a Large Neighborhood Search heuristic for solving Mixed-Integer Programs. The experimental results show that our approach is able to achieve a satisfactory trade-off between the exploration of a larger solution space and the exploitation of high-quality solution regions on both applications.
Recent work has identified noisy and misannotated data as a core cause of hallucinations and unfaithful outputs in Natural Language Generation (NLG) tasks. Consequently, identifying and removing these examples is a key open challenge in creating reliable NLG systems. In this work, we introduce a framework to identify and remove low-quality training instances that lead to undesirable outputs, such as faithfulness errors in text summarization. We show that existing approaches for error tracing, such as gradient-based influence measures, do not perform reliably for detecting faithfulness errors in summarization. We overcome the drawbacks of existing error tracing methods through a new, contrast-based estimate that compares undesired generations to human-corrected outputs. Our proposed method can achieve a mean average precision of 0.91 across synthetic tasks with known ground truth and can achieve a two-fold reduction in hallucinations on a real entity hallucination evaluation on the NYT dataset.
Many real-world applications of language models (LMs), such as code autocomplete and writing assistance, involve human-LM interaction, but the main LM benchmarks are non-interactive, where a system produces output without human intervention. To evaluate human-LM interaction, we develop a framework, Human-AI Language-based Interaction Evaluation (H-LINE), that expands non-interactive evaluation along three dimensions, capturing (i) the interactive process, not only the final output; (ii) the first-person subjective experience, not just a third-party assessment; and (iii) notions of preference beyond quality. We then design five tasks ranging from goal-oriented to open-ended to capture different forms of interaction. On four state-of-the-art LMs (three variants of OpenAI's GPT-3 and AI21's J1-Jumbo), we find that better non-interactive performance does not always result in better human-LM interaction and that first-person and third-party metrics can diverge, suggesting the importance of examining the nuances of human-LM interaction.
We investigate the asymptotic properties of deep Residual networks (ResNets) as the number of layers increases. We first show the existence of scaling regimes for trained weights markedly different from those implicitly assumed in the neural ODE literature. We study the convergence of the hidden state dynamics in these scaling regimes, showing that one may obtain an ODE, a stochastic differential equation (SDE) or neither of these. In particular, our findings point to the existence of a diffusive regime in which the deep network limit is described by a class of stochastic differential equations (SDEs). Finally, we derive the corresponding scaling limits for the backpropagation dynamics.
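To give a feel for why the scaling of the weights with depth matters, the toy sketch below iterates a residual recursion `h_{k+1} = h_k + depth**(-beta) * tanh(W_k h_k)` with untrained i.i.d. Gaussian weights and measures how far the hidden state moves. Everything here is an illustrative assumption rather than the setting of the paper, which studies trained weights: the `tanh` activation, the weight distribution, the exponent `beta`, and the function name are hypothetical.

```python
import math
import random


def residual_displacement(depth, beta, dim=4, seed=0):
    """Return ||h_depth - h_0|| for h_{k+1} = h_k + depth**(-beta) * tanh(W_k h_k)."""
    rng = random.Random(seed)
    h0 = [1.0] * dim
    h = list(h0)
    scale = depth ** (-beta)
    for _ in range(depth):
        # Fresh i.i.d. Gaussian weight matrix per layer, entries of variance 1/dim.
        pre = [
            sum(rng.gauss(0.0, 1.0 / math.sqrt(dim)) * h_j for h_j in h)
            for _ in range(dim)
        ]
        h = [h_i + scale * math.tanh(p) for h_i, p in zip(h, pre)]
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(h, h0)))


# beta = 1/2: increments of size depth**-0.5 add up like a random walk (diffusive).
diffusive = residual_displacement(depth=1000, beta=0.5)
# beta = 1 with i.i.d. weights: the accumulated noise shrinks as depth grows.
vanishing = residual_displacement(depth=1000, beta=1.0)
```

With mean-zero independent weights, the `beta = 0.5` scaling keeps an order-one random displacement at large depth, which is the kind of behaviour a stochastic differential equation limit describes, while the `beta = 1` scaling used for ODE limits washes the randomness out.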